A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Canceling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# To build linear model for statistical analysis and prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
# Library to split data
from sklearn.model_selection import train_test_split
# To get different metric scores
from sklearn import metrics
from sklearn.metrics import accuracy_score, roc_curve, confusion_matrix, roc_auc_score
# Decision Tree packages
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
%load_ext nb_black
The nb_black extension is already loaded. To reload it, use: %reload_ext nb_black
# Importing the Dataset
df = pd.read_csv("INNHotelsGroup.csv")
df.head(10) # Display first 10 rows of the dataset
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 5 | INN00006 | 2 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 346 | 2018 | 9 | 13 | Online | 0 | 0 | 0 | 115.00 | 1 | Canceled |
| 6 | INN00007 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 34 | 2017 | 10 | 15 | Online | 0 | 0 | 0 | 107.55 | 1 | Not_Canceled |
| 7 | INN00008 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 83 | 2018 | 12 | 26 | Online | 0 | 0 | 0 | 105.61 | 1 | Not_Canceled |
| 8 | INN00009 | 3 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 121 | 2018 | 7 | 6 | Offline | 0 | 0 | 0 | 96.90 | 1 | Not_Canceled |
| 9 | INN00010 | 2 | 0 | 0 | 5 | Meal Plan 1 | 0 | Room_Type 4 | 44 | 2018 | 10 | 18 | Online | 0 | 0 | 0 | 133.44 | 3 | Not_Canceled |
# Checking for duplicate values
df.duplicated().sum()
0
df.shape # Displays the rows and columns of the dataset respectively
(36275, 19)
df.info() # Displays the datatypes of the dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
df.isnull().sum() # Checking for null values
Booking_ID                              0
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
df.describe() # Displays Statistical Summary of the Data
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 |
| mean | 1.844962 | 0.105279 | 0.810724 | 2.204300 | 0.030986 | 85.232557 | 2017.820427 | 7.423653 | 15.596995 | 0.025637 | 0.023349 | 0.153411 | 103.423539 | 0.619655 |
| std | 0.518715 | 0.402648 | 0.870644 | 1.410905 | 0.173281 | 85.930817 | 0.383836 | 3.069894 | 8.740447 | 0.158053 | 0.368331 | 1.754171 | 35.089424 | 0.786236 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2017.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 17.000000 | 2018.000000 | 5.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 80.300000 | 0.000000 |
| 50% | 2.000000 | 0.000000 | 1.000000 | 2.000000 | 0.000000 | 57.000000 | 2018.000000 | 8.000000 | 16.000000 | 0.000000 | 0.000000 | 0.000000 | 99.450000 | 0.000000 |
| 75% | 2.000000 | 0.000000 | 2.000000 | 3.000000 | 0.000000 | 126.000000 | 2018.000000 | 10.000000 | 23.000000 | 0.000000 | 0.000000 | 0.000000 | 120.000000 | 1.000000 |
| max | 4.000000 | 10.000000 | 7.000000 | 17.000000 | 1.000000 | 443.000000 | 2018.000000 | 12.000000 | 31.000000 | 1.000000 | 13.000000 | 58.000000 | 540.000000 | 5.000000 |
Leading Questions:
# Creating a countplot of each month
plt.figure(figsize=(15, 5))
fig1 = sns.countplot(data=df, x="arrival_month")
plt.grid(axis="y", linewidth=0.5)
# Add data labels
fig1.bar_label(fig1.containers[0], label_type="edge")
[Text(0, 0, '1014'), Text(0, 0, '1704'), Text(0, 0, '2358'), Text(0, 0, '2736'), Text(0, 0, '2598'), Text(0, 0, '3203'), Text(0, 0, '2920'), Text(0, 0, '3813'), Text(0, 0, '4611'), Text(0, 0, '5317'), Text(0, 0, '2980'), Text(0, 0, '3021')]
# Creating a count plot of the different market segments
plt.figure(figsize=(15, 10))
fig2 = sns.countplot(data=df, x="market_segment_type", palette="mako")
plt.grid(axis="y", linewidth=0.5)
# Add data labels
fig2.bar_label(fig2.containers[0], label_type="edge")
[Text(0, 0, '10528'), Text(0, 0, '23214'), Text(0, 0, '2017'), Text(0, 0, '125'), Text(0, 0, '391')]
# Creating a bar plot of median room price vs. market segment type
fig3 = (
    df.groupby("market_segment_type")["avg_price_per_room"]
    .median(numeric_only=True)
    .plot.bar(color="purple")
)
plt.grid(axis="y", linewidth=0.5)
plt.ylabel("Median of Average Price per Room (in €)")
# Add data labels
fig3.bar_label(fig3.containers[0], label_type="edge")
[Text(0, 0, '95'), Text(0, 0, '0'), Text(0, 0, '79'), Text(0, 0, '90'), Text(0, 0, '107.1')]
# Calculating percentage (No of canceled bookings / Total bookings * 100)
df[df["booking_status"] == "Canceled"].shape[0] / df.shape[0] * 100
32.76361130254997
# Group by repeat/non-repeat guests and show the share (as a decimal) of canceled and non-canceled bookings in each group
print(df.groupby("repeated_guest")["booking_status"].value_counts(normalize=True))
# The canceled share for repeat guests is 0.0172, so only 1.72% of bookings made by repeat guests are canceled
repeated_guest booking_status
0 Not_Canceled 0.664196
Canceled 0.335804
1 Not_Canceled 0.982796
Canceled 0.017204
Name: booking_status, dtype: float64
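The same breakdown can also be read as a table of row-wise proportions. The sketch below uses `pd.crosstab` with `normalize="index"` on a small made-up frame; the `toy` data is hypothetical and is not taken from the INN Hotels dataset.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame(
    {
        "repeated_guest": [0, 0, 0, 0, 1, 1],
        "booking_status": [
            "Canceled",
            "Not_Canceled",
            "Not_Canceled",
            "Canceled",
            "Not_Canceled",
            "Not_Canceled",
        ],
    }
)

# normalize="index" makes each row sum to 1:
# the share of each booking status within a guest type
rates = pd.crosstab(toy["repeated_guest"], toy["booking_status"], normalize="index")
print(rates)
```

On the real data, replacing `toy` with `df` should give the same proportions as the grouped `value_counts` call, laid out with one row per guest type.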
df.head(2)
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
# Creating a countplot of booking status based on number of special requests
plt.figure(figsize=(15, 9))
fig4 = sns.countplot(
    data=df, x="no_of_special_requests", hue="booking_status", palette="mako"
)
plt.grid(axis="y", linewidth=0.5)
# Creating a countplot of booking status based on whether there was a special request
df["special_request"] = df["no_of_special_requests"] > 0
fig5 = sns.countplot(
    data=df, x="special_request", hue="booking_status", palette="rocket"
)
# Checking for outliers
df.describe()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 | 36275.000000 |
| mean | 1.844962 | 0.105279 | 0.810724 | 2.204300 | 0.030986 | 85.232557 | 2017.820427 | 7.423653 | 15.596995 | 0.025637 | 0.023349 | 0.153411 | 103.423539 | 0.619655 |
| std | 0.518715 | 0.402648 | 0.870644 | 1.410905 | 0.173281 | 85.930817 | 0.383836 | 3.069894 | 8.740447 | 0.158053 | 0.368331 | 1.754171 | 35.089424 | 0.786236 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2017.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 17.000000 | 2018.000000 | 5.000000 | 8.000000 | 0.000000 | 0.000000 | 0.000000 | 80.300000 | 0.000000 |
| 50% | 2.000000 | 0.000000 | 1.000000 | 2.000000 | 0.000000 | 57.000000 | 2018.000000 | 8.000000 | 16.000000 | 0.000000 | 0.000000 | 0.000000 | 99.450000 | 0.000000 |
| 75% | 2.000000 | 0.000000 | 2.000000 | 3.000000 | 0.000000 | 126.000000 | 2018.000000 | 10.000000 | 23.000000 | 0.000000 | 0.000000 | 0.000000 | 120.000000 | 1.000000 |
| max | 4.000000 | 10.000000 | 7.000000 | 17.000000 | 1.000000 | 443.000000 | 2018.000000 | 12.000000 | 31.000000 | 1.000000 | 13.000000 | 58.000000 | 540.000000 | 5.000000 |
# Create a boxplot of number of previous cancellations
fig6 = sns.boxplot(data=df, y="no_of_previous_cancellations")
# Create a countplot of number of previous cancellations
fig7 = sns.countplot(
    data=df[df["no_of_previous_cancellations"] > 0],
    x="no_of_previous_cancellations",
    palette="rocket",
)
# Add data labels
fig7.bar_label(fig7.containers[0], label_type="edge")
df.shape # Check shape of dataset again
(36275, 20)
# Filtering the dataset to show the number of bookings with more than 10 previous cancellations
df[df["no_of_previous_cancellations"] > 10]
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | special_request |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1561 | INN01562 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 0 | 2018 | 1 | 14 | Online | 1 | 11 | 4 | 81.90 | 0 | Not_Canceled | False |
| 3322 | INN03323 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 5 | 2018 | 1 | 15 | Online | 1 | 11 | 5 | 89.30 | 1 | Not_Canceled | True |
| 3530 | INN03531 | 2 | 0 | 0 | 1 | Not Selected | 1 | Room_Type 1 | 2 | 2018 | 1 | 16 | Online | 1 | 11 | 10 | 86.00 | 1 | Not_Canceled | True |
| 7110 | INN07111 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 10 | 2018 | 2 | 13 | Online | 1 | 11 | 22 | 79.00 | 1 | Not_Canceled | True |
| 10686 | INN10687 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 1 | 2018 | 2 | 12 | Online | 1 | 11 | 22 | 79.00 | 1 | Not_Canceled | True |
| 10890 | INN10891 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 1 | 2018 | 1 | 15 | Online | 1 | 11 | 5 | 75.00 | 0 | Not_Canceled | False |
| 11834 | INN11835 | 2 | 0 | 0 | 2 | Not Selected | 1 | Room_Type 1 | 1 | 2018 | 1 | 8 | Online | 1 | 11 | 0 | 76.50 | 0 | Not_Canceled | False |
| 12097 | INN12098 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 190 | 2018 | 4 | 9 | Offline | 1 | 13 | 1 | 70.00 | 0 | Canceled | False |
| 12109 | INN12110 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 190 | 2018 | 4 | 9 | Offline | 1 | 13 | 1 | 70.00 | 0 | Canceled | False |
| 12554 | INN12555 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 3 | 2018 | 1 | 14 | Online | 1 | 11 | 4 | 83.90 | 1 | Not_Canceled | True |
| 14030 | INN14031 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 1 | 2018 | 1 | 8 | Online | 1 | 11 | 0 | 67.50 | 0 | Not_Canceled | False |
| 16277 | INN16278 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 190 | 2018 | 4 | 9 | Offline | 1 | 13 | 1 | 70.00 | 0 | Canceled | False |
| 16919 | INN16920 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2018 | 1 | 12 | Online | 1 | 11 | 4 | 82.90 | 1 | Not_Canceled | True |
| 19779 | INN19780 | 3 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 8 | 2018 | 2 | 10 | Online | 1 | 11 | 22 | 92.67 | 2 | Not_Canceled | True |
| 20739 | INN20740 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 1 | 14 | Online | 1 | 11 | 4 | 67.22 | 0 | Not_Canceled | False |
| 23792 | INN23793 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 5 | 2018 | 1 | 14 | Online | 1 | 11 | 4 | 73.90 | 1 | Not_Canceled | True |
| 24950 | INN24951 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 0 | 2018 | 1 | 17 | Online | 1 | 11 | 16 | 93.00 | 0 | Not_Canceled | False |
| 27499 | INN27500 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 1 | 2018 | 1 | 9 | Online | 1 | 11 | 1 | 89.30 | 1 | Not_Canceled | True |
| 28891 | INN28892 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 4 | 2018 | 1 | 15 | Online | 1 | 11 | 5 | 77.00 | 1 | Not_Canceled | True |
| 28914 | INN28915 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 1 | 2018 | 1 | 29 | Online | 1 | 11 | 20 | 106.00 | 1 | Not_Canceled | True |
| 28972 | INN28973 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2018 | 1 | 15 | Online | 1 | 11 | 5 | 73.90 | 1 | Not_Canceled | True |
| 30363 | INN30364 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 5 | 2018 | 1 | 22 | Online | 1 | 11 | 19 | 89.00 | 1 | Not_Canceled | True |
| 30833 | INN30834 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 190 | 2018 | 4 | 9 | Offline | 1 | 13 | 1 | 70.00 | 0 | Canceled | False |
| 32148 | INN32149 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2018 | 1 | 13 | Online | 1 | 11 | 4 | 77.50 | 0 | Not_Canceled | False |
| 32722 | INN32723 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 12 | 2018 | 2 | 5 | Online | 1 | 11 | 21 | 108.00 | 1 | Not_Canceled | True |
| 33760 | INN33761 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 8 | 2018 | 1 | 15 | Online | 1 | 11 | 5 | 80.30 | 1 | Not_Canceled | True |
| 34906 | INN34907 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2018 | 1 | 12 | Online | 1 | 11 | 4 | 80.30 | 1 | Not_Canceled | True |
| 34909 | INN34910 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 3 | 2018 | 1 | 16 | Online | 1 | 11 | 10 | 80.10 | 1 | Not_Canceled | True |
| 36079 | INN36080 | 1 | 0 | 0 | 1 | Not Selected | 1 | Room_Type 1 | 0 | 2018 | 1 | 7 | Online | 1 | 11 | 0 | 77.50 | 0 | Not_Canceled | False |
# Removing the outliers: keep only bookings with fewer than 10 previous cancellations
# (the rows filtered above all have 11 or 13)
df = df[df["no_of_previous_cancellations"] < 10]
df.shape # Checking shape
(36246, 20)
# Creating a separate dataframe for the regression model, dropping the Booking_ID and special_request columns
df_reg = df.drop(["special_request", "Booking_ID"], axis=1)
print(df_reg.shape)
# One-hot encoding the categorical columns, dropping the first level of each
df_reg = pd.get_dummies(df_reg, columns=["type_of_meal_plan"], drop_first=True)
df_reg = pd.get_dummies(df_reg, columns=["room_type_reserved"], drop_first=True)
df_reg = pd.get_dummies(df_reg, columns=["market_segment_type"], drop_first=True)
df_reg.shape
(36246, 18)
(36246, 28)
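The jump from 18 to 28 columns follows from `drop_first=True`: each categorical column with k levels is replaced by k - 1 dummy columns (3 for the meal plan, 6 for the room type, 4 for the market segment, so 18 - 3 + 13 = 28). A minimal sketch on a hypothetical one-column frame shows the effect:

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame(
    {"market_segment_type": ["Online", "Offline", "Corporate", "Online"]}
)

# Levels are sorted alphabetically; the first one ("Corporate") is dropped
# and becomes the baseline that the remaining dummies are compared against
encoded = pd.get_dummies(toy, columns=["market_segment_type"], drop_first=True)
print(encoded.columns.tolist())
print(encoded)
```

A row that belongs to the dropped level has zeros in every remaining dummy column, so no information is lost.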
# Replacing booking status strings with 0 (Not_Canceled) or 1 (Canceled)
df_reg["booking_status"] = df_reg["booking_status"].map(
    {"Not_Canceled": 0, "Canceled": 1}
)
df_reg.info() # Displays all datatypes
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36246 entries, 0 to 36274
Data columns (total 28 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          36246 non-null  int64
 1   no_of_children                        36246 non-null  int64
 2   no_of_weekend_nights                  36246 non-null  int64
 3   no_of_week_nights                     36246 non-null  int64
 4   required_car_parking_space            36246 non-null  int64
 5   lead_time                             36246 non-null  int64
 6   arrival_year                          36246 non-null  int64
 7   arrival_month                         36246 non-null  int64
 8   arrival_date                          36246 non-null  int64
 9   repeated_guest                        36246 non-null  int64
 10  no_of_previous_cancellations          36246 non-null  int64
 11  no_of_previous_bookings_not_canceled  36246 non-null  int64
 12  avg_price_per_room                    36246 non-null  float64
 13  no_of_special_requests                36246 non-null  int64
 14  booking_status                        36246 non-null  int64
 15  type_of_meal_plan_Meal Plan 2         36246 non-null  uint8
 16  type_of_meal_plan_Meal Plan 3         36246 non-null  uint8
 17  type_of_meal_plan_Not Selected        36246 non-null  uint8
 18  room_type_reserved_Room_Type 2        36246 non-null  uint8
 19  room_type_reserved_Room_Type 3        36246 non-null  uint8
 20  room_type_reserved_Room_Type 4        36246 non-null  uint8
 21  room_type_reserved_Room_Type 5        36246 non-null  uint8
 22  room_type_reserved_Room_Type 6        36246 non-null  uint8
 23  room_type_reserved_Room_Type 7        36246 non-null  uint8
 24  market_segment_type_Complementary     36246 non-null  uint8
 25  market_segment_type_Corporate         36246 non-null  uint8
 26  market_segment_type_Offline           36246 non-null  uint8
 27  market_segment_type_Online            36246 non-null  uint8
dtypes: float64(1), int64(14), uint8(13)
memory usage: 4.9 MB
X = df_reg.drop(["booking_status"], axis=1)
Y = df_reg["booking_status"]
# adding a constant to X variable
X = add_constant(X)
# creating dummies (df_reg is already one-hot encoded, so this call leaves X unchanged)
X = pd.get_dummies(X, drop_first=True)
# Splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.30, random_state=1, stratify=Y
)
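Passing `stratify=Y` asks `train_test_split` to keep the class proportions the same in both splits, which matters here because roughly a third of the bookings are canceled. A minimal sketch on a hypothetical target (the names `X_toy` and `y_toy` are made up for the example) shows how this can be checked:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy target: 30 positives out of 100 rows
y_toy = np.array([1] * 30 + [0] * 70)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=1, stratify=y_toy
)

# With stratify, both splits keep the 30% positive rate of the full data
print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```

The same check on the notebook's data would compare `y_train.mean()` and `y_test.mean()` with `Y.mean()`.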
# Checking that the dummy variables were created properly: each dummy column should have exactly 2 unique values (0/1)
df_reg.nunique()
no_of_adults                               5
no_of_children                             6
no_of_weekend_nights                       8
no_of_week_nights                         18
required_car_parking_space                 2
lead_time                                352
arrival_year                               2
arrival_month                             12
arrival_date                              31
repeated_guest                             2
no_of_previous_cancellations               7
no_of_previous_bookings_not_canceled      59
avg_price_per_room                      3930
no_of_special_requests                     6
booking_status                             2
type_of_meal_plan_Meal Plan 2              2
type_of_meal_plan_Meal Plan 3              2
type_of_meal_plan_Not Selected             2
room_type_reserved_Room_Type 2             2
room_type_reserved_Room_Type 3             2
room_type_reserved_Room_Type 4             2
room_type_reserved_Room_Type 5             2
room_type_reserved_Room_Type 6             2
room_type_reserved_Room_Type 7             2
market_segment_type_Complementary          2
market_segment_type_Corporate              2
market_segment_type_Offline                2
market_segment_type_Online                 2
dtype: int64
# Output the Rows and Columns of the dataset respectively
df_reg.shape
(36246, 28)
# Checking statistical summary of altered dataset after removing outliers
df_reg["no_of_previous_cancellations"].describe()
count    36246.000000
mean         0.014346
std          0.184407
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          6.000000
Name: no_of_previous_cancellations, dtype: float64
# Making a duplicate dataset and replacing strings with binary values to show direct correlation trends in heatmap
df_hm = df.replace("Not_Canceled", 0)
df_hm = df_hm.replace("Canceled", 1)
df_hm
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | special_request |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | 0 | False |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | 0 | True |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | 1 | False |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | 1 | False |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 36270 | INN36271 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80 | 1 | 0 | True |
| 36271 | INN36272 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95 | 2 | 1 | True |
| 36272 | INN36273 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | 0 | True |
| 36273 | INN36274 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | 1 | False |
| 36274 | INN36275 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | 0 | False |
36246 rows × 20 columns
# Creating a Heatmap
plt.figure(figsize=(17, 12))
sns.heatmap(df_hm.corr(numeric_only=True), annot=True, cmap="Spectral", vmin=-1, vmax=1)
plt.show()
# let's check the VIF of the predictors
vif_series = pd.Series(
    [variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
    index=X_train.columns,
    dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                   3.946254e+07
no_of_adults                            1.340447e+00
no_of_children                          1.924150e+00
no_of_weekend_nights                    1.070664e+00
no_of_week_nights                       1.097704e+00
required_car_parking_space              1.036525e+00
lead_time                               1.393533e+00
arrival_year                            1.437135e+00
arrival_month                           1.270359e+00
arrival_date                            1.007380e+00
repeated_guest                          1.719460e+00
no_of_previous_cancellations            2.180645e+00
no_of_previous_bookings_not_canceled    2.344258e+00
avg_price_per_room                      2.036414e+00
no_of_special_requests                  1.250788e+00
type_of_meal_plan_Meal Plan 2           1.265350e+00
type_of_meal_plan_Meal Plan 3           1.025209e+00
type_of_meal_plan_Not Selected          1.275806e+00
room_type_reserved_Room_Type 2          1.092112e+00
room_type_reserved_Room_Type 3          1.001197e+00
room_type_reserved_Room_Type 4          1.363580e+00
room_type_reserved_Room_Type 5          1.031790e+00
room_type_reserved_Room_Type 6          1.931982e+00
room_type_reserved_Room_Type 7          1.123504e+00
market_segment_type_Complementary       4.189398e+00
market_segment_type_Corporate           1.535340e+01
market_segment_type_Offline             5.783434e+01
market_segment_type_Online              6.408100e+01
dtype: float64
# Dropping market_segment_type_Online from the training data
X_train = X_train.drop(["market_segment_type_Online"], axis=1)
# let's check the VIF of the predictors again
vif_series = pd.Series(
    [variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
    index=X_train.columns,
    dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                   3.937402e+07
no_of_adults                            1.322499e+00
no_of_children                          1.923254e+00
no_of_weekend_nights                    1.070151e+00
no_of_week_nights                       1.096666e+00
required_car_parking_space              1.036525e+00
lead_time                               1.388523e+00
arrival_year                            1.434248e+00
arrival_month                           1.269288e+00
arrival_date                            1.007372e+00
repeated_guest                          1.717102e+00
no_of_previous_cancellations            2.180610e+00
no_of_previous_bookings_not_canceled    2.343721e+00
avg_price_per_room                      2.035541e+00
no_of_special_requests                  1.245677e+00
type_of_meal_plan_Meal Plan 2           1.264939e+00
type_of_meal_plan_Meal Plan 3           1.025209e+00
type_of_meal_plan_Not Selected          1.273682e+00
room_type_reserved_Room_Type 2          1.091935e+00
room_type_reserved_Room_Type 3          1.001197e+00
room_type_reserved_Room_Type 4          1.358342e+00
room_type_reserved_Room_Type 5          1.031790e+00
room_type_reserved_Room_Type 6          1.931657e+00
room_type_reserved_Room_Type 7          1.123356e+00
market_segment_type_Complementary       1.360896e+00
market_segment_type_Corporate           1.519875e+00
market_segment_type_Offline             1.610215e+00
dtype: float64
# Dropping market_segment_type_Offline from the training data
X_train = X_train.drop(["market_segment_type_Offline"], axis=1)
# let's check the VIF of the predictors again
vif_series = pd.Series(
    [variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
    index=X_train.columns,
    dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                   3.923399e+07
no_of_adults                            1.322200e+00
no_of_children                          1.919913e+00
no_of_weekend_nights                    1.064620e+00
no_of_week_nights                       1.094575e+00
required_car_parking_space              1.035138e+00
lead_time                               1.366900e+00
arrival_year                            1.429207e+00
arrival_month                           1.268399e+00
arrival_date                            1.007356e+00
repeated_guest                          1.716802e+00
no_of_previous_cancellations            2.180419e+00
no_of_previous_bookings_not_canceled    2.343211e+00
avg_price_per_room                      1.947482e+00
no_of_special_requests                  1.145572e+00
type_of_meal_plan_Meal Plan 2           1.205473e+00
type_of_meal_plan_Meal Plan 3           1.025185e+00
type_of_meal_plan_Not Selected          1.173885e+00
room_type_reserved_Room_Type 2          1.080601e+00
room_type_reserved_Room_Type 3          1.001191e+00
room_type_reserved_Room_Type 4          1.336218e+00
room_type_reserved_Room_Type 5          1.030384e+00
room_type_reserved_Room_Type 6          1.931338e+00
room_type_reserved_Room_Type 7          1.122559e+00
market_segment_type_Complementary       1.326263e+00
market_segment_type_Corporate           1.419882e+00
dtype: float64
# Dropping no_of_previous_bookings_not_canceled from the training data
X_train = X_train.drop(["no_of_previous_bookings_not_canceled"], axis=1)
# let's check the VIF of the predictors again
vif_series = pd.Series(
    [variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
    index=X_train.columns,
    dtype=float,
)
print("VIF values: \n\n{}\n".format(vif_series))
VIF values:

const                                   3.905665e+07
no_of_adults                            1.321707e+00
no_of_children                          1.919873e+00
no_of_weekend_nights                    1.064568e+00
no_of_week_nights                       1.094574e+00
required_car_parking_space              1.034809e+00
lead_time                               1.366892e+00
arrival_year                            1.422746e+00
arrival_month                           1.268399e+00
arrival_date                            1.007197e+00
repeated_guest                          1.632303e+00
no_of_previous_cancellations            1.303820e+00
avg_price_per_room                      1.946136e+00
no_of_special_requests                  1.139100e+00
type_of_meal_plan_Meal Plan 2           1.204575e+00
type_of_meal_plan_Meal Plan 3           1.025158e+00
type_of_meal_plan_Not Selected          1.173859e+00
room_type_reserved_Room_Type 2          1.080598e+00
room_type_reserved_Room_Type 3          1.001188e+00
room_type_reserved_Room_Type 4          1.336105e+00
room_type_reserved_Room_Type 5          1.030096e+00
room_type_reserved_Room_Type 6          1.931033e+00
room_type_reserved_Room_Type 7          1.121675e+00
market_segment_type_Complementary       1.320492e+00
market_segment_type_Corporate           1.408832e+00
dtype: float64
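To make the VIF values easier to interpret, the sketch below recomputes the quantity from its definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors plus an intercept. It is a from-scratch illustration in NumPy, not the statsmodels implementation used in this notebook; the helper name `vif_from_scratch` and the toy data are hypothetical, and the helper assumes no constant column is passed in.

```python
import numpy as np
import pandas as pd


def vif_from_scratch(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the others plus an intercept."""
    values = X.to_numpy(dtype=float)
    n_rows, n_cols = values.shape
    result = {}
    for j in range(n_cols):
        target = values[:, j]
        others = np.delete(values, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        # Ordinary least squares fit of column j on the remaining columns
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        result[X.columns[j]] = 1.0 / (1.0 - r_squared)
    return pd.Series(result, dtype=float)


# Hypothetical toy predictors: "b" is nearly a copy of "a", "c" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame(
    {
        "a": a,
        "b": a + rng.normal(scale=0.1, size=200),
        "c": rng.normal(size=200),
    }
)
vif = vif_from_scratch(toy)
print(vif)
```

The nearly duplicated columns `a` and `b` get large VIFs while `c` stays close to 1, which mirrors how the correlated market-segment dummies stood out in the first VIF table.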
# fitting the model on training set
logit = sm.Logit(y_train, X_train.astype(float))
lg = logit.fit(maxiter=200)
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.452066
Iterations: 200
C:\Users\yatik\anaconda3\lib\site-packages\statsmodels\base\model.py:604: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
# let's print the logistic regression summary
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25372
Model: Logit Df Residuals: 25347
Method: MLE Df Model: 24
Date: Fri, 23 Feb 2024 Pseudo R-squ.: 0.2854
Time: 17:26:06 Log-Likelihood: -11470.
converged: False LL-Null: -16051.
Covariance Type: nonrobust LLR p-value: 0.000
=====================================================================================================
coef std err z P>|z| [0.025 0.975]
-----------------------------------------------------------------------------------------------------
const -1134.0547 117.655 -9.639 0.000 -1364.654 -903.456
no_of_adults 0.1159 0.036 3.238 0.001 0.046 0.186
no_of_children 0.2320 0.054 4.278 0.000 0.126 0.338
no_of_weekend_nights 0.1715 0.019 9.003 0.000 0.134 0.209
no_of_week_nights 0.0757 0.012 6.392 0.000 0.052 0.099
required_car_parking_space -1.2532 0.132 -9.486 0.000 -1.512 -0.994
lead_time 0.0134 0.000 55.586 0.000 0.013 0.014
arrival_year 0.5601 0.058 9.605 0.000 0.446 0.674
arrival_month -0.0496 0.006 -7.889 0.000 -0.062 -0.037
arrival_date 0.0028 0.002 1.481 0.139 -0.001 0.006
repeated_guest -2.8340 0.627 -4.520 0.000 -4.063 -1.605
no_of_previous_cancellations 0.4037 0.426 0.948 0.343 -0.431 1.238
avg_price_per_room 0.0221 0.001 30.786 0.000 0.021 0.023
no_of_special_requests -1.1795 0.027 -43.183 0.000 -1.233 -1.126
type_of_meal_plan_Meal Plan 2 -0.3495 0.062 -5.623 0.000 -0.471 -0.228
type_of_meal_plan_Meal Plan 3 26.7609 8.24e+05 3.25e-05 1.000 -1.61e+06 1.61e+06
type_of_meal_plan_Not Selected 0.7610 0.050 15.369 0.000 0.664 0.858
room_type_reserved_Room_Type 2 0.2078 0.125 1.667 0.096 -0.037 0.452
room_type_reserved_Room_Type 3 -0.0536 1.115 -0.048 0.962 -2.238 2.131
room_type_reserved_Room_Type 4 0.0585 0.050 1.161 0.246 -0.040 0.157
room_type_reserved_Room_Type 5 -1.0446 0.209 -5.008 0.000 -1.453 -0.636
room_type_reserved_Room_Type 6 -0.9570 0.141 -6.769 0.000 -1.234 -0.680
room_type_reserved_Room_Type 7 -1.6868 0.311 -5.419 0.000 -2.297 -1.077
market_segment_type_Complementary -33.2764 8.24e+05 -4.04e-05 1.000 -1.61e+06 1.61e+06
market_segment_type_Corporate -0.0773 0.099 -0.777 0.437 -0.272 0.118
=====================================================================================================
# Checking Accuracy
pred_train = lg.predict(X_train) > 0.5
pred_train = np.round(pred_train)
print("Accuracy on training set : ", accuracy_score(y_train, pred_train))
Accuracy on training set : 0.7880734668138105
# [1] Removing variables with high P values
X_train1 = X_train.drop(["market_segment_type_Complementary"], axis=1)
# Fitting the model after removing the variable
logit1 = sm.Logit(y_train, X_train1.astype(float))
lg1 = logit1.fit()
# Checking Accuracy
pred_train1 = lg1.predict(X_train1)
pred_train1 = np.round(pred_train1)
print("Accuracy on training set : ", accuracy_score(y_train, pred_train1))
Optimization terminated successfully.
Current function value: 0.452297
Iterations 12
Accuracy on training set : 0.7880340532870881
# let's print the logistic regression summary
print(lg1.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25372
Model: Logit Df Residuals: 25348
Method: MLE Df Model: 23
Date: Fri, 23 Feb 2024 Pseudo R-squ.: 0.2850
Time: 17:31:40 Log-Likelihood: -11476.
converged: True LL-Null: -16051.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -1123.8341 117.581 -9.558 0.000 -1354.289 -893.379
no_of_adults 0.1171 0.036 3.272 0.001 0.047 0.187
no_of_children 0.2307 0.054 4.255 0.000 0.124 0.337
no_of_weekend_nights 0.1717 0.019 9.015 0.000 0.134 0.209
no_of_week_nights 0.0765 0.012 6.459 0.000 0.053 0.100
required_car_parking_space -1.2567 0.132 -9.513 0.000 -1.516 -0.998
lead_time 0.0134 0.000 55.743 0.000 0.013 0.014
arrival_year 0.5550 0.058 9.524 0.000 0.441 0.669
arrival_month -0.0502 0.006 -7.987 0.000 -0.063 -0.038
arrival_date 0.0027 0.002 1.453 0.146 -0.001 0.006
repeated_guest -2.8400 0.626 -4.534 0.000 -4.067 -1.612
no_of_previous_cancellations 0.4059 0.425 0.955 0.340 -0.427 1.239
avg_price_per_room 0.0223 0.001 31.269 0.000 0.021 0.024
no_of_special_requests -1.1801 0.027 -43.207 0.000 -1.234 -1.127
type_of_meal_plan_Meal Plan 2 -0.3546 0.062 -5.706 0.000 -0.476 -0.233
type_of_meal_plan_Meal Plan 3 2.0237 2.695 0.751 0.453 -3.258 7.305
type_of_meal_plan_Not Selected 0.7645 0.050 15.444 0.000 0.667 0.862
room_type_reserved_Room_Type 2 0.2093 0.125 1.679 0.093 -0.035 0.454
room_type_reserved_Room_Type 3 -0.0737 1.105 -0.067 0.947 -2.239 2.092
room_type_reserved_Room_Type 4 0.0543 0.050 1.079 0.281 -0.044 0.153
room_type_reserved_Room_Type 5 -1.0516 0.209 -5.043 0.000 -1.460 -0.643
room_type_reserved_Room_Type 6 -0.9687 0.141 -6.856 0.000 -1.246 -0.692
room_type_reserved_Room_Type 7 -1.7108 0.311 -5.503 0.000 -2.320 -1.101
market_segment_type_Corporate -0.0732 0.099 -0.735 0.462 -0.268 0.122
==================================================================================================
# [2] Removing variables with high P values
X_train2 = X_train1.drop(["market_segment_type_Corporate"], axis=1)
# fitting the model on training set
logit2 = sm.Logit(y_train, X_train2.astype(float))
lg2 = logit2.fit()
pred_train2 = lg2.predict(X_train2)
pred_train2 = np.round(pred_train2)
print("Accuracy on training set : ", accuracy_score(y_train, pred_train2))
Optimization terminated successfully.
Current function value: 0.452308
Iterations 12
Accuracy on training set : 0.7893346996689263
# let's print the logistic regression summary
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25372
Model: Logit Df Residuals: 25349
Method: MLE Df Model: 22
Date: Fri, 23 Feb 2024 Pseudo R-squ.: 0.2850
Time: 17:33:24 Log-Likelihood: -11476.
converged: True LL-Null: -16051.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -1123.7810 117.566 -9.559 0.000 -1354.207 -893.355
no_of_adults 0.1212 0.035 3.426 0.001 0.052 0.191
no_of_children 0.2322 0.054 4.286 0.000 0.126 0.338
no_of_weekend_nights 0.1727 0.019 9.095 0.000 0.136 0.210
no_of_week_nights 0.0768 0.012 6.485 0.000 0.054 0.100
required_car_parking_space -1.2578 0.132 -9.520 0.000 -1.517 -0.999
lead_time 0.0134 0.000 56.306 0.000 0.013 0.014
arrival_year 0.5549 0.058 9.524 0.000 0.441 0.669
arrival_month -0.0502 0.006 -7.986 0.000 -0.063 -0.038
arrival_date 0.0027 0.002 1.426 0.154 -0.001 0.006
repeated_guest -2.8658 0.622 -4.611 0.000 -4.084 -1.648
no_of_previous_cancellations 0.4019 0.421 0.953 0.340 -0.424 1.228
avg_price_per_room 0.0223 0.001 31.334 0.000 0.021 0.024
no_of_special_requests -1.1796 0.027 -43.190 0.000 -1.233 -1.126
type_of_meal_plan_Meal Plan 2 -0.3528 0.062 -5.680 0.000 -0.475 -0.231
type_of_meal_plan_Meal Plan 3 2.0312 2.699 0.753 0.452 -3.259 7.321
type_of_meal_plan_Not Selected 0.7687 0.049 15.630 0.000 0.672 0.865
room_type_reserved_Room_Type 2 0.2122 0.125 1.703 0.088 -0.032 0.456
room_type_reserved_Room_Type 3 -0.0777 1.102 -0.070 0.944 -2.238 2.083
room_type_reserved_Room_Type 4 0.0548 0.050 1.088 0.276 -0.044 0.153
room_type_reserved_Room_Type 5 -1.0679 0.208 -5.144 0.000 -1.475 -0.661
room_type_reserved_Room_Type 6 -0.9700 0.141 -6.865 0.000 -1.247 -0.693
room_type_reserved_Room_Type 7 -1.7144 0.311 -5.516 0.000 -2.324 -1.105
==================================================================================================
# [3-8] Removing variables with high P values
X_trainF = X_train2.drop(
    [
        "room_type_reserved_Room_Type 2",
        "room_type_reserved_Room_Type 3",
        "room_type_reserved_Room_Type 4",
        "type_of_meal_plan_Meal Plan 3",
        "no_of_previous_cancellations",
        "arrival_date",
    ],
    axis=1,
)
# fitting the model on training set
logitF = sm.Logit(y_train, X_trainF.astype(float))
lgF = logitF.fit()
pred_trainF = lgF.predict(X_trainF)
pred_trainF = np.round(pred_trainF)
print("Accuracy on training set : ", accuracy_score(y_train, pred_trainF))
Optimization terminated successfully.
Current function value: 0.452456
Iterations 11
Accuracy on training set : 0.7872457827526407
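The manual steps above (drop the weakest variable, refit, check again) can also be expressed as a loop. The block below is a minimal sketch of that idea, not the notebook's own code: `backward_eliminate`, `pvalues_for`, `threshold` and `keep` are hypothetical names, and the p-values in the toy run are made up so the example runs without fitting a model.

```python
from typing import Callable, Dict, List, Sequence


def backward_eliminate(
    columns: Sequence[str],
    pvalues_for: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.05,
    keep: Sequence[str] = ("const",),
) -> List[str]:
    """Drop the column with the highest p-value, refit, and repeat until every
    remaining column (other than those in `keep`) is at or below `threshold`."""
    selected = list(columns)
    while True:
        # `pvalues_for` stands in for "refit the model on these columns".
        pvalues = pvalues_for(selected)
        candidates = {
            c: p for c, p in pvalues.items() if c in selected and c not in keep
        }
        if not candidates:
            return selected
        worst = max(candidates, key=candidates.get)
        if candidates[worst] <= threshold:
            return selected
        selected.remove(worst)


# Toy stand-in for a model refit: fixed, made-up p-values per column.
toy_pvalues = {"const": 0.0, "lead_time": 0.0, "arrival_date": 0.154, "room_type_3": 0.944}
kept = backward_eliminate(
    list(toy_pvalues), lambda cols: {c: toy_pvalues[c] for c in cols}
)
print(kept)
```

With statsmodels, `pvalues_for` could plausibly be `lambda cols: sm.Logit(y_train, X_train[cols].astype(float)).fit(disp=False).pvalues.to_dict()`, assuming `X_train` is a DataFrame that already contains the constant column. Dropping one variable per refit matters because p-values shift once a correlated variable leaves the model.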
## checking summary of the model
print(lgF.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25372
Model: Logit Df Residuals: 25355
Method: MLE Df Model: 16
Date: Fri, 23 Feb 2024 Pseudo R-squ.: 0.2848
Time: 17:58:37 Log-Likelihood: -11480.
converged: True LL-Null: -16051.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -1125.1263 117.459 -9.579 0.000 -1355.342 -894.911
no_of_adults 0.1293 0.034 3.762 0.000 0.062 0.197
no_of_children 0.2498 0.053 4.742 0.000 0.147 0.353
no_of_weekend_nights 0.1746 0.019 9.208 0.000 0.137 0.212
no_of_week_nights 0.0780 0.012 6.631 0.000 0.055 0.101
required_car_parking_space -1.2541 0.132 -9.509 0.000 -1.513 -0.996
lead_time 0.0134 0.000 56.505 0.000 0.013 0.014
arrival_year 0.5556 0.058 9.545 0.000 0.442 0.670
arrival_month -0.0511 0.006 -8.154 0.000 -0.063 -0.039
repeated_guest -2.5478 0.457 -5.580 0.000 -3.443 -1.653
avg_price_per_room 0.0225 0.001 33.119 0.000 0.021 0.024
no_of_special_requests -1.1762 0.027 -43.169 0.000 -1.230 -1.123
type_of_meal_plan_Meal Plan 2 -0.3637 0.062 -5.906 0.000 -0.484 -0.243
type_of_meal_plan_Not Selected 0.7541 0.048 15.725 0.000 0.660 0.848
room_type_reserved_Room_Type 5 -1.0882 0.207 -5.265 0.000 -1.493 -0.683
room_type_reserved_Room_Type 6 -1.0304 0.138 -7.475 0.000 -1.301 -0.760
room_type_reserved_Room_Type 7 -1.7674 0.309 -5.717 0.000 -2.373 -1.161
==================================================================================================
# confusion matrix for the final model (lgF), matching the accuracy printed below
cm = confusion_matrix(y_train, pred_trainF)
sns.heatmap(cm, annot=True, fmt="g")
plt.xlabel("Predicted Values")
plt.ylabel("Actual Values")
plt.show()
print("Accuracy on training set : ", accuracy_score(y_train, pred_trainF))
Accuracy on training set : 0.7872457827526407
ROC-AUC on training set
logit_roc_auc_train = roc_auc_score(y_train, lgF.predict(X_trainF))
fpr, tpr, thresholds = roc_curve(y_train, lgF.predict(X_trainF))
plt.figure(figsize=(5, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
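The area shown in the plot legend has a direct reading: it is the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The block below is a small sketch of that interpretation using a pairwise count instead of `roc_auc_score`. The helper name `pairwise_auc` and the toy labels and scores are made up for illustration, and class 1 is assumed to denote a canceled booking.

```python
from itertools import product


def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Made-up labels (1 assumed to mean canceled) and predicted probabilities.
y_toy = [0, 0, 1, 1, 0, 1]
p_toy = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
auc_toy = pairwise_auc(y_toy, p_toy)
print(round(auc_toy, 3))
```

Because the measure only depends on how the scores rank the bookings, it should be computed from the predicted probabilities, as the cell above does, rather than from labels already rounded at 0.5.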
Odds from coefficients
# converting coefficients to odds
odds = np.exp(lgF.params)
# adding the odds to a dataframe
pd.DataFrame(odds, X_trainF.columns, columns=["odds"]).T
| | const | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | repeated_guest | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| odds | 0.0 | 1.13802 | 1.283717 | 1.190757 | 1.081168 | 0.285318 | 1.013475 | 1.743028 | 0.950145 | 0.078257 | 1.022722 | 0.308439 | 0.695081 | 2.125619 | 0.33683 | 0.356856 | 0.170782 |
# finding the percentage change
perc_change_odds = (np.exp(lgF.params) - 1) * 100
# adding the change_odds% to a dataframe
pd.DataFrame(perc_change_odds, X_trainF.columns, columns=["change_odds%"]).T
| | const | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | repeated_guest | avg_price_per_room | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| change_odds% | -100.0 | 13.802024 | 28.371748 | 19.075744 | 8.116821 | -71.46816 | 1.347539 | 74.302765 | -4.985505 | -92.174252 | 2.272203 | -69.156143 | -30.491853 | 112.561887 | -66.316971 | -64.314359 | -82.921847 |
Looking closely at the main contributors that change the cancellation odds by more than 50%, assuming all other variables are held constant:
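One way to pull those contributors out is sketched below. It repeats the `(exp(coef) - 1) * 100` conversion on coefficients copied, rounded to four decimals, from the final model summary, then keeps the variables whose absolute change exceeds 50%. The names `coefs`, `pct_change` and `main` are illustrative, and applying the cut-off to the absolute value is an assumption about what ">50%" is meant to cover.

```python
import math

# Coefficients copied (rounded) from the final model summary; const is omitted
# because its odds are not interpreted as an effect.
coefs = {
    "no_of_adults": 0.1293,
    "no_of_children": 0.2498,
    "no_of_weekend_nights": 0.1746,
    "no_of_week_nights": 0.0780,
    "required_car_parking_space": -1.2541,
    "lead_time": 0.0134,
    "arrival_year": 0.5556,
    "arrival_month": -0.0511,
    "repeated_guest": -2.5478,
    "avg_price_per_room": 0.0225,
    "no_of_special_requests": -1.1762,
    "type_of_meal_plan_Meal Plan 2": -0.3637,
    "type_of_meal_plan_Not Selected": 0.7541,
    "room_type_reserved_Room_Type 5": -1.0882,
    "room_type_reserved_Room_Type 6": -1.0304,
    "room_type_reserved_Room_Type 7": -1.7674,
}

# Percentage change in odds for a one-unit increase in each variable.
pct_change = {name: (math.exp(b) - 1) * 100 for name, b in coefs.items()}

# Keep variables whose odds change by more than 50% in either direction.
main = {name: round(v, 1) for name, v in pct_change.items() if abs(v) > 50}

for name, v in sorted(main.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:35s} {v:+.1f}%")
```

Read against the table above, a value of about -92% for `repeated_guest` means the odds are roughly 92% lower for a repeat guest than for a first-time guest with otherwise identical booking details, assuming `booking_status` is coded so that 1 denotes a cancellation.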
# dropping variables from test set as well which were dropped from training set
X_test = X_test.drop(
    [
        "market_segment_type_Online",
        "market_segment_type_Offline",
        "no_of_previous_bookings_not_canceled",
        "market_segment_type_Complementary",
        "market_segment_type_Corporate",
        "room_type_reserved_Room_Type 2",
        "room_type_reserved_Room_Type 3",
        "room_type_reserved_Room_Type 4",
        "type_of_meal_plan_Meal Plan 3",
        "no_of_previous_cancellations",
        "arrival_date",
    ],
    axis=1,
)
pred_test = lgF.predict(X_test) > 0.5
pred_test = np.round(pred_test)
print("Accuracy on training set : ", accuracy_score(y_train, pred_trainF))
print("Accuracy on test set : ", accuracy_score(y_test, pred_test))
Accuracy on training set : 0.7872457827526407
Accuracy on test set : 0.789313959904359
# Splitting the data into train and test
XD = df_reg.drop(["booking_status"], axis=1)
YD = df_reg["booking_status"]
# Splitting data in train and test sets
XD_train, XD_test, yD_train, yD_test = train_test_split(
    XD, YD, test_size=0.30, random_state=1
)
dTree = DecisionTreeClassifier(criterion="gini", random_state=1)
dTree.fit(XD_train, yD_train)
DecisionTreeClassifier(random_state=1)
# Scoring the model
print("Accuracy on training set : ", dTree.score(XD_train, yD_train))
print("Accuracy on test set : ", dTree.score(XD_test, yD_test))
Accuracy on training set : 0.9941667980450891
Accuracy on test set : 0.8738274783888174
## Function to create confusion matrix
def make_confusion_matrix(model, yD_actual, labels=(0, 1)):
    """
    model : classifier used to predict on XD_test
    yD_actual : ground truth for XD_test
    labels : class order for the rows and columns of the matrix
    """
    yD_predict = model.predict(XD_test)
    cm = metrics.confusion_matrix(yD_actual, yD_predict, labels=list(labels))
    df_cm = pd.DataFrame(
        cm,
        index=["Actual - No", "Actual - Yes"],
        columns=["Predicted - No", "Predicted - Yes"],
    )
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    # named `annotations` so the cell text does not shadow the `labels` argument
    annotations = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    annotations = np.asarray(annotations).reshape(2, 2)
    plt.figure(figsize=(10, 7))
    sns.heatmap(df_cm, annot=annotations, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
## Function to calculate recall score
def get_recall_score(model):
    """
    model : classifier to predict values of X
    """
    pred_train = model.predict(XD_train)
    pred_test = model.predict(XD_test)
    print("Recall on training set : ", metrics.recall_score(yD_train, pred_train))
    print("Recall on test set : ", metrics.recall_score(yD_test, pred_test))
make_confusion_matrix(dTree, yD_test)
# Recall on train and test
get_recall_score(dTree)
Recall on training set : 0.9865916437208189
Recall on test set : 0.8117913832199547
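Recall, as printed above, is the share of actual positives that the model identifies: TP / (TP + FN). The sketch below shows how that figure relates to a 2x2 confusion matrix laid out with `labels=[0, 1]`, as in `make_confusion_matrix`. The helper name `recall_from_confusion` and the counts are made up for illustration and are not taken from the notebook's matrix.

```python
def recall_from_confusion(cm):
    """cm is [[TN, FP], [FN, TP]]: rows are actual classes, columns are
    predicted classes, in the order given by labels=[0, 1]."""
    fn, tp = cm[1]
    return tp / (tp + fn)


# Made-up counts for illustration only.
toy_cm = [[90, 10], [20, 80]]
toy_recall = recall_from_confusion(toy_cm)
print(toy_recall)
```

If class 1 denotes a canceled booking, a lower recall on the test set than on the training set means more cancellations go undetected on unseen bookings than the training figure suggests.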
# Displaying the features used and visualizing the unpruned decision tree
feature_names = list(XD.columns)
print(feature_names)
plt.figure(figsize=(50, 60))
tree.plot_tree(
    dTree,
    feature_names=feature_names,
    filled=True,
    fontsize=7,
    node_ids=True,
    class_names=True,
)
plt.show()
['no_of_adults', 'no_of_children', 'no_of_weekend_nights', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'repeated_guest', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
# Display rules of the decision tree
print(tree.export_text(dTree, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50 | |--- no_of_special_requests <= 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time <= 90.50 | | | | |--- avg_price_per_room <= 215.61 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time <= 16.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 162.53 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 162.53 | | | | | | | | | | |--- avg_price_per_room <= 169.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 169.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- arrival_date <= 29.00 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_date > 29.00 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- lead_time <= 10.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 10.00 | | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | |--- lead_time > 16.50 | | | | | | | | |--- avg_price_per_room <= 135.00 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 115.35 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- avg_price_per_room > 115.35 | | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- weights: [31.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 135.00 | | | | | | | | | |--- 
weights: [0.00, 5.00] class: 1 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 3 <= 0.50 | | | | | | | | |--- weights: [1638.00, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 3 > 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- lead_time <= 65.50 | | | | | | | |--- no_of_weekend_nights <= 4.50 | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- lead_time <= 59.00 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- lead_time > 59.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- lead_time <= 2.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 2.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 24 | | | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- lead_time <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- lead_time > 3.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | |--- no_of_weekend_nights > 4.50 | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | |--- lead_time > 65.50 | | | | | | | |--- avg_price_per_room <= 99.98 | | | | | | | | |--- lead_time <= 75.50 | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | 
|--- arrival_date > 18.50 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 75.50 | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | |--- lead_time <= 87.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 87.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- avg_price_per_room > 99.98 | | | | | | | | |--- lead_time <= 81.00 | | | | | | | | | |--- avg_price_per_room <= 123.25 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 123.25 | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 81.00 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- avg_price_per_room <= 110.62 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 110.62 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 215.61 | | | | | |--- no_of_children <= 0.50 | | | | | | |--- weights: [0.00, 17.00] class: 1 | | | | | |--- no_of_children > 0.50 | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- lead_time > 90.50 | | | | |--- lead_time <= 116.50 | | | | | |--- avg_price_per_room <= 91.22 | | | | | | |--- no_of_week_nights <= 2.50 | | | 
| | | | |--- avg_price_per_room <= 75.07 | | | | | | | | |--- avg_price_per_room <= 58.75 | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 58.75 | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 75.07 | | | | | | | | |--- arrival_date <= 8.00 | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 7.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 8.00 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 83.50 | | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 83.50 | | | | | | | | | | | |--- weights: [9.00, 2.00] class: 0 | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- weights: [56.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- arrival_month <= 10.00 | | | | | | | | | | | |--- weights: [4.00, 
0.00] class: 0 | | | | | | | | | | |--- arrival_month > 10.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_date > 11.50 | | | | | | | | |--- avg_price_per_room <= 64.25 | | | | | | | | | |--- lead_time <= 114.50 | | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 114.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 64.25 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- avg_price_per_room <= 65.35 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 65.35 | | | | | | | | | | | |--- weights: [18.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- arrival_date <= 16.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 16.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | |--- avg_price_per_room > 91.22 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | |--- no_of_week_nights <= 1.00 | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | |--- no_of_week_nights > 1.00 | | | | | | | | | |--- lead_time <= 105.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 105.50 | | | | | 
| | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 12.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- avg_price_per_room <= 138.70 | | | | | | | | | | |--- weights: [0.00, 65.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 138.70 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- avg_price_per_room <= 108.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- avg_price_per_room > 108.50 | | | | | | | | |--- lead_time <= 104.00 | | | | | | | | | |--- avg_price_per_room <= 131.50 | | | | | | | | | | |--- weights: [47.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 131.50 | | | | | | | | | | |--- arrival_date <= 23.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 23.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- lead_time > 104.00 | 
| | | | | | | | |--- avg_price_per_room <= 110.86 | | | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 110.86 | | | | | | | | | | |--- lead_time <= 111.00 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 111.00 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | |--- lead_time > 116.50 | | | | | |--- no_of_week_nights <= 0.50 | | | | | | |--- avg_price_per_room <= 92.50 | | | | | | | |--- arrival_date <= 23.00 | | | | | | | | |--- weights: [5.00, 2.00] class: 0 | | | | | | | |--- arrival_date > 23.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 92.50 | | | | | | | |--- arrival_date <= 11.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 11.00 | | | | | | | | |--- weights: [0.00, 21.00] class: 1 | | | | | |--- no_of_week_nights > 0.50 | | | | | | |--- no_of_adults <= 1.50 | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | |--- weights: [124.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 27.50 | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 1.50 | | | | | | | |--- lead_time <= 125.50 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- lead_time <= 124.50 | | | | | | | | | | |--- lead_time <= 121.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 121.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time > 124.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- lead_time <= 120.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time > 
120.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- avg_price_per_room <= 76.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 76.50 | | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | |--- lead_time > 125.50 | | | | | | | | |--- avg_price_per_room <= 93.26 | | | | | | | | | |--- lead_time <= 149.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 149.50 | | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 93.26 | | | | | | | | | |--- avg_price_per_room <= 96.45 | | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | | |--- weights: [14.00, 2.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 96.45 | | | | | | | | | | |--- avg_price_per_room <= 118.78 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 118.78 | | | | | | | | | | | |--- truncated branch of depth 2 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time <= 20.50 | | | | |--- lead_time <= 3.50 | | | | | |--- avg_price_per_room <= 202.67 | | | | | | |--- arrival_month <= 5.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | |--- weights: [62.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | |--- no_of_week_nights <= 
3.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | |--- arrival_month <= 3.00 | | | | | | | | | | |--- no_of_week_nights <= 5.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_week_nights > 5.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_month > 3.00 | | | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | |--- arrival_month > 5.50 | | | | | | | |--- no_of_week_nights <= 8.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- avg_price_per_room <= 136.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- avg_price_per_room > 136.50 | | | | | | | | | | |--- avg_price_per_room <= 140.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 140.00 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- 
arrival_month > 8.50 | | | | | | | | | |--- avg_price_per_room <= 94.66 | | | | | | | | | | |--- weights: [73.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 94.66 | | | | | | | | | | |--- avg_price_per_room <= 95.05 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 95.05 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- no_of_week_nights > 8.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- avg_price_per_room > 202.67 | | | | | | |--- arrival_month <= 11.00 | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | |--- arrival_month > 11.00 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- lead_time > 3.50 | | | | | |--- avg_price_per_room <= 108.20 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [80.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- no_of_week_nights <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 7.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- lead_time <= 8.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 8.50 | | | | | | | | | | |--- avg_price_per_room <= 67.61 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 67.61 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [148.00, 
0.00] class: 0 | | | | | |--- avg_price_per_room > 108.20 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | |--- avg_price_per_room <= 195.33 | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 111.67 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 111.67 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 195.33 | | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_date <= 26.50 | | | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | | | |--- truncated branch 
of depth 9 | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [33.00, 0.00] class: 0 | | | |--- lead_time > 20.50 | | | | |--- avg_price_per_room <= 105.28 | | | | | |--- avg_price_per_room <= 60.99 | | | | | | |--- avg_price_per_room <= 29.10 | | | | | | | |--- weights: [26.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 29.10 | | | | | | | |--- avg_price_per_room <= 41.62 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- avg_price_per_room > 41.62 | | | | | | | | |--- lead_time <= 84.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 84.50 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- arrival_month <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | |--- avg_price_per_room > 60.99 | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- lead_time <= 49.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | 
| | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- lead_time > 49.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 25.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 25.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 71.92 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 71.92 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- arrival_month > 4.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 85.13 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 85.13 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- weights: [0.00, 38.00] class: 1 | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | |--- arrival_month <= 4.50 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | |--- lead_time <= 102.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 102.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | |--- lead_time <= 94.50 | | | | | | | | | | | 
|--- truncated branch of depth 15 | | | | | | | | | | |--- lead_time > 94.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- lead_time <= 76.50 | | | | | | | | | | | |--- weights: [0.00, 21.00] class: 1 | | | | | | | | | | |--- lead_time > 76.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- arrival_month > 4.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 104.88 | | | | | | | | | | |--- avg_price_per_room <= 74.45 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 74.45 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | |--- avg_price_per_room > 104.88 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 105.28 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- arrival_year <= 2017.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- lead_time <= 74.50 | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [16.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | |--- avg_price_per_room <= 121.12 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 121.12 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 74.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | 
| | | |--- weights: [0.00, 12.00] class: 1 | | | | | | |--- arrival_year > 2017.50 | | | | | | | |--- avg_price_per_room <= 195.19 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 192.92 | | | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | |--- avg_price_per_room > 192.92 | | | | | | | | | | |--- lead_time <= 36.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time > 36.50 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_date > 22.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- avg_price_per_room > 195.19 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1 | | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | |--- weights: [0.00, 65.00] class: 1 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- lead_time <= 48.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 48.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- no_of_week_nights <= 9.00 | | | | | | | |--- weights: [34.00, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 9.00 | | | | | | | |--- 
weights: [0.00, 1.00] class: 1 | |--- no_of_special_requests > 0.50 | | |--- no_of_special_requests <= 1.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- no_of_weekend_nights <= 2.50 | | | | | |--- avg_price_per_room <= 215.50 | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | |--- lead_time <= 91.50 | | | | | | | | |--- avg_price_per_room <= 170.27 | | | | | | | | | |--- avg_price_per_room <= 129.50 | | | | | | | | | | |--- weights: [873.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 129.50 | | | | | | | | | | |--- avg_price_per_room <= 131.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 131.50 | | | | | | | | | | | |--- weights: [22.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 170.27 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- lead_time > 91.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- lead_time <= 148.00 | | | | | | | | | | |--- avg_price_per_room <= 109.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 109.50 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 148.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- avg_price_per_room <= 142.00 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 142.00 | | | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | | | 
|--- weights: [1.00, 0.00] class: 0 | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | |--- lead_time <= 60.50 | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | |--- weights: [16.00, 0.00] class: 0 | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [4.00, 1.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 60.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | |--- avg_price_per_room > 215.50 | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- no_of_weekend_nights > 2.50 | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- lead_time <= 9.50 | | | | | |--- lead_time <= 4.50 | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | |--- avg_price_per_room <= 167.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- arrival_date <= 17.50 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_date > 17.50 | | | | | | | | | | |--- avg_price_per_room <= 153.50 | | | | | | | | | | | |--- 
truncated branch of depth 6 | | | | | | | | | | |--- avg_price_per_room > 153.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_per_room > 167.50 | | | | | | | | |--- avg_price_per_room <= 168.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 168.50 | | | | | | | | | |--- lead_time <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 181.17 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 181.17 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time > 0.50 | | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | | |--- weights: [48.00, 0.00] class: 0 | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- lead_time > 4.50 | | | | | | |--- arrival_month <= 9.50 | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | |--- arrival_date <= 12.50 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 94.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 94.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- arrival_date <= 6.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- arrival_date > 6.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 12.50 | | | | | | | | | |--- avg_price_per_room <= 66.40 | | | | | | | | | | |--- avg_price_per_room <= 65.45 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 65.45 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 66.40 | | | | | | | | | | |--- 
arrival_date <= 26.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 26.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- arrival_month > 5.50 | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | |--- lead_time <= 6.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- lead_time > 6.50 | | | | | | | | | | |--- no_of_children <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- no_of_children > 5.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- arrival_month > 9.50 | | | | | | | |--- avg_price_per_room <= 126.33 | | | | | | | | |--- weights: [102.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 126.33 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- avg_price_per_room <= 153.10 | | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 153.10 | | | | | | | | | | |--- avg_price_per_room <= 176.88 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 176.88 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- lead_time <= 6.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 6.50 | | | | | | | | | | |--- arrival_date <= 10.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 10.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | |--- lead_time > 9.50 | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | |--- avg_price_per_room <= 118.55 | 
| | | | | | |--- lead_time <= 59.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- weights: [100.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- avg_price_per_room <= 66.30 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- avg_price_per_room > 66.30 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | |--- weights: [165.00, 0.00] class: 0 | | | | | | | |--- lead_time > 59.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- arrival_date <= 25.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 25.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- avg_price_per_room <= 77.85 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- avg_price_per_room > 77.85 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | |--- avg_price_per_room > 118.55 | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | | |--- arrival_date <= 20.50 | | | | | | | | | | |--- lead_time <= 142.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- lead_time > 
142.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | | |--- avg_price_per_room <= 121.38 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 121.38 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | |--- arrival_month > 8.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 102.00 | | | | | | | | | | | |--- weights: [79.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 102.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | |--- weights: [118.00, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | |--- lead_time <= 108.50 | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 1.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- lead_time <= 50.50 | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1 | | | | | | | | | |--- lead_time > 50.50 | | | | | | | | | | |--- lead_time <= 85.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 85.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- lead_time <= 33.50 | | | | | | | | | | |--- no_of_week_nights <= 9.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 9.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time > 33.50 | | | | | | | | | | |--- avg_price_per_room <= 144.12 | | | | | | | | | | | |--- 
truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 144.12 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- lead_time > 108.50 | | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 89.50 | | | | |--- no_of_week_nights <= 3.50 | | | | | |--- weights: [2096.00, 0.00] class: 0 | | | | |--- no_of_week_nights > 3.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- lead_time <= 9.50 | | | | | | | | | |--- avg_price_per_room <= 133.58 | | | | | | | | | | |--- weights: [33.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 133.58 | | | | | | | | | | |--- avg_price_per_room <= 141.60 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 141.60 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- lead_time > 9.50 | | | | | | | | | |--- lead_time <= 19.50 | | | | | | | | | | |--- arrival_month <= 5.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 5.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 19.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [32.00, 0.00] class: 0 | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [61.00, 0.00] class: 0 | | | |--- lead_time > 89.50 | | | | |--- avg_price_per_room <= 202.14 | | | | 
| |--- arrival_month <= 8.50 | | | | | | |--- arrival_year <= 2017.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 2.50 | | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- arrival_year > 2017.50 | | | | | | | |--- lead_time <= 150.50 | | | | | | | | |--- no_of_week_nights <= 7.00 | | | | | | | | | |--- lead_time <= 95.50 | | | | | | | | | | |--- arrival_month <= 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- arrival_month > 3.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 95.50 | | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- no_of_week_nights > 7.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- lead_time > 150.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- arrival_month > 8.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- lead_time <= 95.50 | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | |--- lead_time <= 93.50 | | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- 
no_of_weekend_nights > 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- lead_time > 93.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | |--- avg_price_per_room <= 134.91 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- avg_price_per_room > 134.91 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- lead_time > 95.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | |--- lead_time <= 130.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 130.00 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [56.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 202.14 | | | | | |--- weights: [0.00, 8.00] 
class: 1 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- no_of_adults <= 1.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- lead_time <= 162.50 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- lead_time > 162.50 | | | | | | | |--- weights: [0.00, 16.00] class: 1 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- arrival_date <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 1.50 | | | | | | | | | | | |--- weights: [60.00, 6.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- avg_price_per_room <= 70.85 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 70.85 | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | | |--- no_of_week_nights <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | |--- lead_time <= 231.50 | | | | | 
| | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 231.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- avg_price_per_room <= 75.00 | | | | | | | | |--- arrival_month <= 7.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_month > 7.00 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 75.00 | | | | | | | | |--- avg_price_per_room <= 88.00 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | |--- avg_price_per_room > 88.00 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- lead_time <= 347.50 | | | | | | | | | | | |--- weights: [3.00, 3.00] class: 0 | | | | | | | | | | |--- lead_time > 347.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [1.00, 3.00] class: 1 | | | | |--- no_of_adults > 1.50 | | | | | |--- avg_price_per_room <= 83.72 | | | | | | |--- lead_time <= 244.00 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- weights: [0.00, 30.00] class: 1 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- lead_time <= 203.50 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 203.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | |--- weights: [22.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 66.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- avg_price_per_room <= 64.80 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 64.80 | | | | | | | | | | | |--- weights: [0.00, 10.00] 
class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- avg_price_per_room <= 27.77 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- avg_price_per_room > 27.77 | | | | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 66.50 | | | | | | | | | |--- avg_price_per_room <= 81.76 | | | | | | | | | | |--- no_of_week_nights <= 9.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- no_of_week_nights > 9.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 81.76 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time > 244.00 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [31.00, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [36.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 83.72 | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | | |--- weights: [0.00, 
216.00] class: 1 | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- avg_price_per_room <= 94.52 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 94.52 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | |--- arrival_month <= 6.00 | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | | |--- arrival_month > 6.00 | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | |--- market_segment_type_Online > 0.50 | | | | |--- avg_price_per_room <= 8.00 | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | |--- arrival_date <= 19.50 | | | | | | | |--- lead_time <= 205.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 205.00 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | |--- arrival_date > 19.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | |--- arrival_date <= 8.50 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | |--- arrival_date > 8.50 | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 8.00 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 535.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | |--- lead_time <= 254.50 | | | | | | | | | |--- lead_time <= 210.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- lead_time > 210.50 | | | | | | | | | | |--- type_of_meal_plan_Not 
Selected <= 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 254.50 | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | |--- weights: [0.00, 54.00] class: 1 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- weights: [1.00, 0.00] class: 0 | | |--- no_of_special_requests > 0.50 | | | |--- no_of_weekend_nights <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- lead_time > 159.50 | | | | | | |--- arrival_date <= 1.50 | | | | | | | |--- lead_time <= 176.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- lead_time > 176.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- arrival_date > 1.50 | | | | | | | |--- no_of_adults <= 0.50 | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_adults > 0.50 | | | | | | | | |--- weights: [50.00, 0.00] class: 0 | | | | |--- lead_time > 180.50 | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- 
arrival_month <= 11.50 | | | | | | | | |--- lead_time <= 336.00 | | | | | | | | | |--- weights: [0.00, 117.00] class: 1 | | | | | | | | |--- lead_time > 336.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- lead_time <= 300.50 | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | |--- lead_time <= 221.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 221.50 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- lead_time > 300.50 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- no_of_adults <= 2.50 | | | | | | | |--- lead_time <= 298.00 | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | |--- lead_time > 298.00 | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- no_of_adults > 2.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- no_of_weekend_nights > 0.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 348.50 | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | |--- weights: [149.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.50 | | | | | | | | |--- no_of_week_nights <= 3.00 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.00 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | 
| | |--- arrival_month > 7.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- lead_time > 348.50 | | | | | | |--- lead_time <= 372.50 | | | | | | | |--- avg_price_per_room <= 58.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 58.50 | | | | | | | | |--- weights: [4.00, 2.00] class: 0 | | | | | | |--- lead_time > 372.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- arrival_month <= 11.50 | | | | | | |--- avg_price_per_room <= 76.48 | | | | | | | |--- no_of_weekend_nights <= 3.00 | | | | | | | | |--- lead_time <= 289.00 | | | | | | | | | |--- arrival_date <= 27.00 | | | | | | | | | | |--- weights: [51.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 27.00 | | | | | | | | | | |--- lead_time <= 245.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 245.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 289.00 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_weekend_nights > 3.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- avg_price_per_room > 76.48 | | | | | | | |--- no_of_week_nights <= 5.50 | | | | | | | | |--- lead_time <= 248.50 | | | | | | | | | |--- lead_time <= 187.50 | | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- lead_time > 187.50 | | | | | | | | | | |--- avg_price_per_room <= 86.62 | | | | | | | | | | | |--- weights: [35.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 86.62 | | | | | | | | | | | |--- truncated 
branch of depth 11 | | | | | | | | |--- lead_time > 248.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- lead_time <= 252.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- lead_time > 252.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- no_of_week_nights > 5.50 | | | | | | | | |--- lead_time <= 204.50 | | | | | | | | | |--- no_of_weekend_nights <= 3.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_weekend_nights > 3.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- lead_time > 204.50 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | |--- arrival_month > 11.50 | | | | | | |--- arrival_date <= 29.50 | | | | | | | |--- no_of_week_nights <= 8.00 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- lead_time <= 245.50 | | | | | | | | | | |--- arrival_date <= 28.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 28.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 245.50 | | | | | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | 
|--- no_of_special_requests > 2.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- no_of_week_nights > 8.00 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- arrival_date > 29.50 | | | | | | | |--- weights: [0.00, 5.00] class: 1 | |--- avg_price_per_room > 100.04 | | |--- arrival_month <= 11.50 | | | |--- no_of_special_requests <= 2.50 | | | | |--- weights: [0.00, 2135.00] class: 1 | | | |--- no_of_special_requests > 2.50 | | | | |--- weights: [26.00, 0.00] class: 0 | | |--- arrival_month > 11.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- weights: [53.00, 0.00] class: 0 | | | |--- no_of_special_requests > 0.50 | | | | |--- arrival_date <= 24.00 | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | |--- arrival_date > 24.00 | | | | | |--- lead_time <= 172.50 | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | |--- lead_time > 172.50 | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- no_of_children > 0.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | |--- weights: [0.00, 8.00] class: 1
# looking at Gini importance of the columns
print(
pd.DataFrame(
dTree.feature_importances_, columns=["Imp"], index=XD_train.columns
).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.358356
avg_price_per_room                    0.172766
market_segment_type_Online            0.092207
arrival_date                          0.079548
no_of_special_requests                0.067472
arrival_month                         0.062923
no_of_week_nights                     0.043018
no_of_weekend_nights                  0.036916
no_of_adults                          0.027805
arrival_year                          0.013883
room_type_reserved_Room_Type 4        0.008165
type_of_meal_plan_Not Selected        0.007607
required_car_parking_space            0.006833
type_of_meal_plan_Meal Plan 2         0.006009
no_of_children                        0.005847
market_segment_type_Offline           0.003194
room_type_reserved_Room_Type 2        0.002290
room_type_reserved_Room_Type 5        0.001362
repeated_guest                        0.001229
no_of_previous_bookings_not_canceled  0.000697
room_type_reserved_Room_Type 6        0.000503
market_segment_type_Corporate         0.000488
room_type_reserved_Room_Type 7        0.000396
market_segment_type_Complementary     0.000182
type_of_meal_plan_Meal Plan 3         0.000181
no_of_previous_cancellations          0.000121
room_type_reserved_Room_Type 3        0.000000
# Generating a plot of features according to their importance
importances = dTree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="RED", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Creating a decision tree with max depth of 3
dTree1 = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1)
dTree1.fit(XD_train, yD_train)
DecisionTreeClassifier(max_depth=3, random_state=1)
# Creating a confusion matrix of the new tree
make_confusion_matrix(dTree1, yD_test)
# Accuracy on train and test
print("Accuracy on training set : ", dTree1.score(XD_train, yD_train))
print("Accuracy on test set : ", dTree1.score(XD_test, yD_test))
# Recall on train and test
get_recall_score(dTree1)
Accuracy on training set : 0.7869698880655841
Accuracy on test set : 0.7855434982527129
Recall on training set : 0.7343469412187238
Recall on test set : 0.7315759637188208
plt.figure(figsize=(15, 10))
tree.plot_tree(
dTree1,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
plt.show()
# Printing the rules of the depth-3 decision tree
print(tree.export_text(dTree1, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [4628.00, 789.00] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [2490.00, 2745.00] class: 1
|   |--- no_of_special_requests > 0.50
|   |   |--- no_of_special_requests <= 1.50
|   |   |   |--- weights: [5666.00, 1034.00] class: 0
|   |   |--- no_of_special_requests > 1.50
|   |   |   |--- weights: [2881.00, 148.00] class: 0
|--- lead_time > 151.50
|   |--- avg_price_per_room <= 100.04
|   |   |--- no_of_special_requests <= 0.50
|   |   |   |--- weights: [670.00, 1254.00] class: 1
|   |   |--- no_of_special_requests > 0.50
|   |   |   |--- weights: [596.00, 236.00] class: 0
|   |--- avg_price_per_room > 100.04
|   |   |--- arrival_month <= 11.50
|   |   |   |--- weights: [26.00, 2135.00] class: 1
|   |   |--- arrival_month > 11.50
|   |   |   |--- weights: [62.00, 12.00] class: 0
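The leaf weights printed above are enough to recover the training metrics of the depth-3 tree by hand. The sketch below is a worked check rather than part of the original analysis. It assumes that class 1 denotes canceled bookings, that each `weights` pair is a raw count of training samples (`dTree1` was fitted without `class_weight`), and that each leaf predicts its majority class. The names `leaves` and `gini` are introduced here for illustration only.

```python
# Leaf weights [class 0, class 1] copied from the depth-3 rules above.
# Assumption: class 1 = canceled booking, weights = training sample counts.
leaves = [
    (4628, 789),
    (2490, 2745),
    (5666, 1034),
    (2881, 148),
    (670, 1254),
    (596, 236),
    (26, 2135),
    (62, 12),
]


def gini(n0, n1):
    """Gini impurity of a two-class node holding n0 and n1 samples."""
    p1 = n1 / (n0 + n1)
    return 2 * p1 * (1 - p1)


true_pos = 0        # class-1 samples that land in a leaf predicting class 1
correct = 0         # samples whose leaf predicts their own class
total_class1 = 0    # all class-1 samples in the training set
total = 0           # all training samples

for n0, n1 in leaves:
    predicted = 1 if n1 > n0 else 0  # majority class of the leaf
    if predicted == 1:
        true_pos += n1
        correct += n1
    else:
        correct += n0
    total_class1 += n1
    total += n0 + n1
    print(
        f"leaf [{n0}, {n1}] -> class {predicted}, "
        f"class-1 share {n1 / (n0 + n1):.3f}, gini {gini(n0, n1):.3f}"
    )

train_recall = true_pos / total_class1
train_accuracy = correct / total
print("Recall on training set   :", train_recall)
print("Accuracy on training set :", train_accuracy)
```

Both values agree with the training recall and accuracy reported for `dTree1`. The per-leaf shares also show where the shallow tree is least certain: the leaf `[2490, 2745]` is close to an even split, so bookings that reach it are classified with little confidence.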
# looking at Gini importance of the columns
print(
pd.DataFrame(
dTree1.feature_importances_, columns=["Imp"], index=XD_train.columns
).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.504209
market_segment_type_Online            0.194143
no_of_special_requests                0.166143
avg_price_per_room                    0.110694
arrival_month                         0.024811
no_of_week_nights                     0.000000
type_of_meal_plan_Not Selected        0.000000
market_segment_type_Offline           0.000000
market_segment_type_Corporate         0.000000
market_segment_type_Complementary     0.000000
room_type_reserved_Room_Type 7        0.000000
room_type_reserved_Room_Type 6        0.000000
room_type_reserved_Room_Type 5        0.000000
room_type_reserved_Room_Type 4        0.000000
room_type_reserved_Room_Type 3        0.000000
room_type_reserved_Room_Type 2        0.000000
type_of_meal_plan_Meal Plan 3         0.000000
required_car_parking_space            0.000000
type_of_meal_plan_Meal Plan 2         0.000000
no_of_children                        0.000000
no_of_previous_bookings_not_canceled  0.000000
no_of_previous_cancellations          0.000000
repeated_guest                        0.000000
arrival_date                          0.000000
arrival_year                          0.000000
no_of_weekend_nights                  0.000000
no_of_adults                          0.000000
# Generating a plot of features according to their importance
importances = dTree1.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(10, 10))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Creating a decision tree with max depth of 16
dTree2 = DecisionTreeClassifier(criterion="gini", max_depth=16, random_state=1)
dTree2.fit(XD_train, yD_train)
DecisionTreeClassifier(max_depth=16, random_state=1)
# Creating a confusion matrix of the new tree
make_confusion_matrix(dTree2, yD_test)
# Accuracy on train and test
print("Accuracy on training set : ", dTree2.score(XD_train, yD_train))
print("Accuracy on test set : ", dTree2.score(XD_test, yD_test))
# Recall on train and test
get_recall_score(dTree2)
Accuracy on training set : 0.9398943717483841
Accuracy on test set : 0.8740114033474342
Recall on training set : 0.9002753501735903
Recall on test set : 0.8072562358276644
# Visualizing the tree
plt.figure(figsize=(40, 50))
tree.plot_tree(
dTree2,
feature_names=feature_names,
filled=True,
fontsize=7,
node_ids=True,
class_names=True,
)
plt.show()
# looking at Gini importance of the columns
print(
pd.DataFrame(
dTree2.feature_importances_, columns=["Imp"], index=XD_train.columns
).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.379431
avg_price_per_room                    0.162222
market_segment_type_Online            0.112882
no_of_special_requests                0.082034
arrival_month                         0.062262
arrival_date                          0.056201
no_of_weekend_nights                  0.033177
no_of_week_nights                     0.031673
no_of_adults                          0.027066
arrival_year                          0.015611
required_car_parking_space            0.007700
type_of_meal_plan_Meal Plan 2         0.006440
type_of_meal_plan_Not Selected        0.005266
room_type_reserved_Room_Type 4        0.004607
market_segment_type_Offline           0.003760
no_of_children                        0.003306
room_type_reserved_Room_Type 2        0.002643
repeated_guest                        0.001414
room_type_reserved_Room_Type 5        0.001244
no_of_previous_bookings_not_canceled  0.000311
room_type_reserved_Room_Type 6        0.000305
market_segment_type_Corporate         0.000223
type_of_meal_plan_Meal Plan 3         0.000222
no_of_previous_cancellations          0.000000
room_type_reserved_Room_Type 3        0.000000
room_type_reserved_Room_Type 7        0.000000
market_segment_type_Complementary     0.000000
# Generating a plot of features according to their importance
importances = dTree2.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(10, 10))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="purple", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Creating a dataframe for cost complexity alphas and impurities for the training data
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(XD_train, yD_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000 | 0.007381 |
| 1 | 0.000000 | 0.007381 |
| 2 | 0.000000 | 0.007381 |
| 3 | 0.000000 | 0.007381 |
| 4 | 0.000000 | 0.007381 |
| ... | ... | ... |
| 1344 | 0.006970 | 0.285544 |
| 1345 | 0.012982 | 0.298526 |
| 1346 | 0.017160 | 0.315686 |
| 1347 | 0.023910 | 0.363506 |
| 1348 | 0.078164 | 0.441669 |
1349 rows × 2 columns
# Creating a plot of leaf impurity vs effective alphas of the decision tree
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post", color="r")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
# Training the decision tree using generated alphas
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    clf.fit(XD_train, yD_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.07816367532161222
# Generating plots of the number of nodes and the tree depth against alpha (the last, single-node tree is dropped)
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
train_scores = [clf.score(XD_train, yD_train) for clf in clfs]
test_scores = [clf.score(XD_test, yD_test) for clf in clfs]
fig, ax = plt.subplots(figsize=(10, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(ccp_alphas, train_scores, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, test_scores, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
index_best_model = np.argmax(test_scores)
best_model = clfs[index_best_model]
print(best_model)
print("Training accuracy of best model: ", best_model.score(XD_train, yD_train))
print("Test accuracy of best model: ", best_model.score(XD_test, yD_test))
DecisionTreeClassifier(ccp_alpha=4.2041095170529206e-05, random_state=1)
Training accuracy of best model: 0.9731988018287876
Test accuracy of best model: 0.8792532646680155
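The selection above picks `ccp_alpha` by comparing scores on the test set, so the reported test score is no longer an independent estimate of how the chosen tree generalizes. One possible alternative, sketched here under stated assumptions and not taken from the original analysis, is to choose the alpha by cross-validation on the training data and touch the test set only once at the end. The synthetic data from `make_classification` stands in for `XD_train` and `yD_train`, and names such as `candidate_alphas` and `search` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the booking data (assumption, for illustration only).
X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Candidate alphas come from the pruning path of the training data.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (single-node tree) and clip tiny negative values that
# floating-point error can produce, since ccp_alpha must be non-negative.
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# 5-fold cross-validation on the training set, scored by recall.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid={"ccp_alpha": candidate_alphas},
    scoring="recall",
    cv=5,
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["ccp_alpha"]
test_recall = recall_score(y_test, search.best_estimator_.predict(X_test))
print("Best ccp_alpha (cross-validated):", best_alpha)
print("Recall on held-out test set     :", test_recall)
```

With the full pruning path of the booking data (1349 alphas) this search would fit several thousand trees, so thinning the candidate list first may be worthwhile.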
# Generating data of recall vs alphas for training and testing data
recall_train = []
for clf in clfs:
    pred_trainD = clf.predict(XD_train)
    values_train = metrics.recall_score(yD_train, pred_trainD)
    recall_train.append(values_train)
recall_test = []
for clf in clfs:
    pred_testD = clf.predict(XD_test)
    values_test = metrics.recall_score(yD_test, pred_testD)
    recall_test.append(values_test)
# Plotting for recall vs alpha for training and testing data
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
# Selecting the model with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=2.3648116033422675e-05, random_state=1)
# Creating a confusion matrix for this model
make_confusion_matrix(best_model, yD_test)
# Recall on train and test
get_recall_score(best_model)
Recall on training set : 0.9867113611875973
Recall on test set : 0.8129251700680272
# Visualizing the CCP decision tree
plt.figure(figsize=(17, 15))
tree.plot_tree(
best_model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=True,
class_names=True,
)
plt.show()
# looking at Gini importance of the columns
print(
pd.DataFrame(
best_model.feature_importances_, columns=["Imp"], index=XD_train.columns
).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.358882
avg_price_per_room                    0.172550
market_segment_type_Online            0.092650
arrival_date                          0.079076
no_of_special_requests                0.067781
arrival_month                         0.063057
no_of_week_nights                     0.042555
no_of_weekend_nights                  0.036658
no_of_adults                          0.027723
arrival_year                          0.013868
room_type_reserved_Room_Type 4        0.008159
type_of_meal_plan_Not Selected        0.007644
required_car_parking_space            0.006835
type_of_meal_plan_Meal Plan 2         0.006038
no_of_children                        0.005875
market_segment_type_Offline           0.003210
room_type_reserved_Room_Type 2        0.002301
room_type_reserved_Room_Type 5        0.001369
repeated_guest                        0.001235
no_of_previous_bookings_not_canceled  0.000700
room_type_reserved_Room_Type 6        0.000505
market_segment_type_Corporate         0.000445
room_type_reserved_Room_Type 7        0.000398
market_segment_type_Complementary     0.000182
type_of_meal_plan_Meal Plan 3         0.000182
no_of_previous_cancellations          0.000122
room_type_reserved_Room_Type 3        0.000000
# Generating a plot of features according to their importance
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="lime", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
comparison_frame = pd.DataFrame(
{
"Model": [
"Unpruned decision tree model",
"Decision tree with maximum depth = 3",
"Decision tree with maximum depth = 16",
"Decision tree with Cost Complexity Pruning",
],
"Train_Recall": [0.99, 0.73, 0.90, 0.99],
"Test_Recall": [0.81, 0.73, 0.81, 0.81],
}
)
comparison_frame
| | Model | Train_Recall | Test_Recall |
|---|---|---|---|
| 0 | Unpruned decision tree model | 0.99 | 0.81 |
| 1 | Decision tree with maximum depth = 3 | 0.73 | 0.73 |
| 2 | Decision tree with maximum depth = 16 | 0.90 | 0.81 |
| 3 | Decision tree with Cost Complexity Pruning | 0.99 | 0.81 |
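The recall values in the comparison table above are typed in by hand, which makes them easy to get out of sync with the fitted models. A minimal sketch of building the same kind of table directly from the models follows. It uses synthetic data from `make_classification` in place of the booking data, and the `models` dictionary and `rows` list are illustrative names, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for XD_train / XD_test (assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Label -> unfitted model; extend with further variants as needed.
models = {
    "Unpruned decision tree model": DecisionTreeClassifier(random_state=1),
    "Decision tree with maximum depth = 3": DecisionTreeClassifier(
        max_depth=3, random_state=1
    ),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append(
        {
            "Model": name,
            # Recall is computed from predictions instead of being hard-coded.
            "Train_Recall": round(recall_score(y_train, model.predict(X_train)), 2),
            "Test_Recall": round(recall_score(y_test, model.predict(X_test)), 2),
        }
    )

comparison = pd.DataFrame(rows)
print(comparison)
```

In the notebook itself, the already fitted `dTree1`, `dTree2`, and `best_model` could be placed in such a dictionary and scored on `XD_train` and `XD_test` without refitting.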